Greg Detre
Monday, October 07, 2002
observable environment, restricted move-space, yet the games themselves are hard to play – will not be able to search the entire tree
Agenda
adversarial search
alpha-beta pruning to speed up search
evaluation functions to approximate search
improve heuristics
Antony: would you necessarily need to be able to see more moves ahead to beat Kasparov?
it depends on your pruning, and moreover, on your evaluation functions – so, assuming we can do more search than Kasparov, he must be evaluating the board better
assume that pruning and evaluation can be pretty independent
www.brainsinbahrain.com
latest chess competition, pitting the new world champion (Kramnik) against the Deep Fritz program
he used his signature Berlin Wall formation to draw, because he knew that, playing black, he could (in theory) only draw
minimax
crucial assumption: that the opponent plays rationally
apparently, if your opponent plays sub-optimally and you assume they're going to be rational, you can only do as well as or better than the minimax value (assuming you can search the whole tree)
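The minimax idea in a minimal sketch – the nested-list tree (leaves as utilities for the maximiser) is just an illustration, not from the lecture:

```python
def minimax(node, maximizing=True):
    # Leaves are numbers (utilities for the maximiser);
    # internal nodes are lists of children.
    if not isinstance(node, list):
        return node
    children = [minimax(c, not maximizing) for c in node]
    return max(children) if maximizing else min(children)

# a 2-ply tree: max to move, then min
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree))  # 3
```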
alphabeta
alpha is the lower bound on the value of a max node
beta is the upper bound on the value of a min node
(i.e. the value of a min node is always going to be less than or equal to this – it can only get smaller as more children are examined)
used the alpha value to prune at the min node
so you use these to help prune the rest of your search
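A sketch of how alpha and beta are threaded through the search, on the same kind of nested-list tree (illustrative, not the lecturer's code):

```python
def alphabeta(node, maximizing=True, alpha=float('-inf'), beta=float('inf')):
    # Leaves are numbers; internal nodes are lists of children.
    if not isinstance(node, list):
        return node
    if maximizing:
        value = float('-inf')
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:   # min node above will never allow this line
                break
        return value
    else:
        value = float('inf')
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)
            if beta <= alpha:   # prune: max above already has >= alpha
                break
        return value

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree))  # 3 – same answer as plain minimax, fewer nodes visited
```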
transposition tables
a transposition is a different permutation of a move sequence that results in the same position – the table itself is just a hash table keyed by position
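Sketch of the memoisation: search values are cached keyed by position, so transpositions (different move orders reaching the same position) are only searched once. Real programs hash the board (e.g. Zobrist hashing); here the "position" is just the nested-list subtree keyed by its repr – an assumption for illustration:

```python
table = {}

def minimax_tt(node, maximizing=True):
    key = (repr(node), maximizing)
    if key in table:
        return table[key]   # position already evaluated via another move order
    if not isinstance(node, list):
        value = node
    else:
        children = [minimax_tt(c, not maximizing) for c in node]
        value = max(children) if maximizing else min(children)
    table[key] = value
    return value

# the repeated subtree is only searched once, then looked up
tree = [[3, 12, 8], [3, 12, 8]]
print(minimax_tt(tree))  # 3
```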
tic-tac-toe
use a 2-ply search
one of our moves, and one of our opponent's moves
a ply is just a level in a search tree
use symmetry – don't enumerate all of the tree, e.g. the first level has only 3 distinct moves (corner, edge, centre)
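Checking that claim in code: under the 8 symmetries of the square, the 9 opening moves collapse to 3 equivalence classes (this little canonicalisation scheme is my own illustration):

```python
def symmetries(board):
    def rot(b):                       # rotate 90 degrees
        return tuple(zip(*b[::-1]))
    def ref(b):                       # reflect horizontally
        return tuple(row[::-1] for row in b)
    out, b = [], board
    for _ in range(4):
        out.append(b)
        out.append(ref(b))
        b = rot(b)
    return out

distinct = set()
for r in range(3):
    for c in range(3):
        b = [[0] * 3 for _ in range(3)]
        b[r][c] = 1                   # place the first move at (r, c)
        b = tuple(tuple(row) for row in b)
        distinct.add(min(symmetries(b)))  # canonical form of this move
print(len(distinct))  # 3
```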
backgammon
Tesauro's TD-Gammon (early 1990s) – best computer player in the world; the computer learnt by playing copies of itself (rather than against an expert), uses a neural net
initially, Neurogammon used a NN, fed it expert advice – very tedious for the expert – 20,000 positions – only mediocre, could be beaten by its expert trainer
"foregoing tutelage of masters places us at a disadvantage but is also liberating – not hindered by biases or prejudices"
what it learned has led to a rewriting of the backgammon textbooks
similar to go, in that there are many board positions
search space is huge, because of the two dice (c. 20 legal moves per position × c. 20 distinct dice rolls, so a branching factor of about 400)
need randomness to perform search, and avoid pathologies
learn key concepts first
still 2-ply search
quiescence search
a fixed cut-off depth is not very robust, and can lead to problems (e.g. if there's a great move one ply beyond your search)
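A sketch of the quiescence idea – at the cut-off, keep expanding only the "noisy" moves (captures, checks) until the position is quiet, rather than trusting the static evaluation of a volatile position. The toy game interface here is entirely made up for illustration:

```python
def quiesce(state, game, maximizing=True):
    noisy = game.noisy_moves(state)
    if not noisy:                      # quiet position: trust the static eval
        return game.evaluate(state)
    values = [quiesce(game.result(state, m), game, not maximizing)
              for m in noisy]
    return max(values) if maximizing else min(values)

class ToyGame:
    # hypothetical stand-in: a state is (static_eval, [noisy successor states])
    def evaluate(self, s): return s[0]
    def noisy_moves(self, s): return range(len(s[1]))
    def result(self, s, m): return s[1][m]

# statically the position looks like +5, but a forced capture just beyond
# the cut-off resolves it to -2 – exactly what a fixed depth would miss
pos = (5, [(-2, [])])
print(quiesce(pos, ToyGame()))  # -2
```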
horizon effect
appears from a limited depth of search – need a heuristic to evaluate apparently very successful moves whose consequences are postponed beyond the horizon, but which might just require a 20+ move "ladder" to resolve
iterative deepening
Deep Blue used iterative deepening – it had to move against the clock
when the clock runs out, we use the solution from the deepest fully completed search
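A sketch of iterative deepening against a clock (the time limit and depth cap are arbitrary, and the depth-0 "static evaluation" is just a stub returning 0):

```python
import time

def depth_limited_value(node, depth, maximizing=True):
    if not isinstance(node, list):
        return node                    # true terminal value
    if depth == 0:
        return 0                       # static evaluation stub at the cut-off
    children = [depth_limited_value(c, depth - 1, not maximizing)
                for c in node]
    return max(children) if maximizing else min(children)

def iterative_deepening(tree, time_limit=0.05, max_depth=10):
    deadline = time.monotonic() + time_limit
    best = None
    for depth in range(1, max_depth + 1):
        if time.monotonic() >= deadline:
            break                      # clock ran out: keep the last...
        best = depth_limited_value(tree, depth)   # ...completed search
    return best

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(iterative_deepening(tree))  # 3 (depth 2 already reaches the leaves)
```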
you can't trust alphabeta with any confidence unless the full breadth of the tree has been searched
2GHz PC – 200 million nodes/move in 3 minutes (approx 10^6 nodes/sec)
minimax: approx 35^5 = c. 50 million nodes, i.e. about 5 ply
alphabeta, get to approx 10 ply (expert level)
additional tricks, e.g. metareasoning (14 ply)
database and extension tuning, supercomputer, special hardware – Deep Blue does 126×10^6 nodes/sec
branching factor, holism
you have databases
most of all though, chess has a point-scoring algorithm that works really well
chess uses searches – lower branching (you can prune very effectively, down to about 14 moves sometimes)
whereas it's really hard to evaluate your own position in go
and also depends a great deal on what your opponent does(???)
go programs don't use search – they're based mainly around shapes and rules
less easy to specify your goal state(???) – Niall says that chess programs basically just look for checkmates, whereas you're balancing so many things with Go when thinking about goal states – for starters, I don't know how hard it is to figure out when the game has actually ended
interestingly, it looks from James Luke's talk as though they depended heavily on expert tweaking, rather than machine learning techniques, right???
alpha-beta vs minimax???
feature evaluation function???
transposition table??? a lookup table to avoid re-searching positions reached by different move orders???
presumably you could use statistical methods to assign general values to different chess pieces???
but you need a pretty expressive way to represent the states to do that given a really big search space (e.g. chess), rather than just a big lookup table of chess boards
e.g. you break the board down into features (higher-dimensional representation of the state)
these features need to be easy to compute
often need an expert to identify useful features
Parkes: maybe the problem in Go is that there are lots of different features that you could use
if you just add the features up linearly, that's easiest – but it doesn't allow for interactions between the features (unless you expand the number of features, i.e. to include the interaction that you're looking for)
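The linear case in a sketch – a weighted sum of hand-chosen features; the material-count features and the classic 1/3/3/5/9 piece weights are standard chess lore rather than anything from the lecture:

```python
def material_features(counts):
    # counts: dict like {'P': 8, 'N': 2, ...} of own-minus-opponent piece counts
    order = ['P', 'N', 'B', 'R', 'Q']
    return [counts.get(p, 0) for p in order]

WEIGHTS = [1, 3, 3, 5, 9]   # pawn, knight, bishop, rook, queen

def evaluate(counts):
    # eval(s) = w . f(s): easy to compute, but blind to feature interactions
    # unless you add explicit product features
    return sum(w * f for w, f in zip(WEIGHTS, material_features(counts)))

# up a rook, down a pawn:
print(evaluate({'R': 1, 'P': -1}))  # 4
```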
temporal difference???